Scientific Python antipatterns advent calendar day seven

For today, a simple error that is easy to fall into. As a reminder, I'll post one tiny example per day with the intention that each should only take a couple of minutes to read.

If you want to read them all but can’t be bothered checking this website each day, sign up for the mailing list:

Sign up for the mailing list

and I’ll send a single email at the end with links to them all.

Iterating over dictionaries without using items

In Python we use dictionaries when we want to be able to efficiently look up the value associated with a key:

# create a dictionary
emails = {
    'Martin' : 'martin@pythonforbiologists.com', # only this one is real!
    'Jess'   : 'jessica.rodriguez@gmail.com',
    'Phil'   : 'p.smith@outlook.com'
}

# look up an email
emails['Jess']
'jessica.rodriguez@gmail.com'

Sometimes we need to do something to all the key/value pairs in a dictionary. If we’re not sure how to do this, the obvious thing to try is iterating over the dictionary:

for x in emails:
    print(x)
Martin
Jess
Phil

From the output, we will quickly realise that iterating over a dictionary gives us the keys, so we can add a step to get the matching values:

for name in emails:
    email_address = emails[name]
    print(name, email_address, sep='\t')
Martin  martin@pythonforbiologists.com
Jess    jessica.rodriguez@gmail.com
Phil    p.smith@outlook.com
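As a quick check that plain iteration really does give us only the keys, the sketch below compares it with iterating over the keys() method. It assumes Python 3.7 or later, where dictionaries are guaranteed to keep their keys in insertion order.

```python
# The same (made-up) example dictionary used throughout this post
emails = {
    'Martin': 'martin@pythonforbiologists.com',
    'Jess': 'jessica.rodriguez@gmail.com',
    'Phil': 'p.smith@outlook.com',
}

# Iterating over a dictionary directly visits its keys...
direct = [name for name in emails]

# ...exactly as iterating over the keys() view does
via_keys = [name for name in emails.keys()]

print(direct == via_keys)  # True
print(direct)              # ['Martin', 'Jess', 'Phil']
```

This is why the loop above needs the extra emails[name] lookup to get at each value.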

This works, but is harder than it needs to be. Whenever you see this pattern (a loop that iterates over the keys, with a line inside the loop that looks up the matching value), you can replace it with a call to the items method:

emails.items()
dict_items([('Martin', 'martin@pythonforbiologists.com'), ('Jess', 'jessica.rodriguez@gmail.com'), ('Phil', 'p.smith@outlook.com')])

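The items method gives us something that behaves much like a list of tuples (strictly speaking it is a view of the dictionary, as the dict_items in the output shows), where each tuple contains a key and its matching value. Because each tuple has exactly two elements, we can unpack it straight into two variables and set up the loop in a single line: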

for name, email_address in emails.items():
    print(name, email_address, sep='\t')
Martin  martin@pythonforbiologists.com
Jess    jessica.rodriguez@gmail.com
Phil    p.smith@outlook.com
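To see that items returns a view rather than a copied list, here is a minimal sketch. The extra 'Ana' entry is a hypothetical address added purely for illustration; the point is that the view reflects changes to the dictionary, while list() takes a fixed snapshot.

```python
emails = {
    'Martin': 'martin@pythonforbiologists.com',
    'Jess': 'jessica.rodriguez@gmail.com',
    'Phil': 'p.smith@outlook.com',
}

pairs = emails.items()
print(type(pairs).__name__)  # dict_items

# Hypothetical extra entry, added after the view was created
emails['Ana'] = 'ana@example.org'
print(len(pairs))  # 4: the view sees the new entry

# list() copies the current pairs into an ordinary list of tuples
snapshot = list(emails.items())
print(snapshot[-1])  # ('Ana', 'ana@example.org')
```

For simple loops like the one above the distinction rarely matters, but it is worth knowing that you should not add or remove keys while looping over the view.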

which normally makes the body of the loop clearer. Occasionally you might see this:

for item in emails.items():
    name, email_address = item
    print(name, email_address, sep='\t')
Martin  martin@pythonforbiologists.com
Jess    jessica.rodriguez@gmail.com
Phil    p.smith@outlook.com

or even this:

for item in emails.items():
    name          = item[0]
    email_address = item[1]
    print(name, email_address, sep='\t')
Martin  martin@pythonforbiologists.com
Jess    jessica.rodriguez@gmail.com
Phil    p.smith@outlook.com

but both are clumsier than unpacking the tuple directly in the for statement.
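A small sketch to confirm that the three styles visit exactly the same (name, email address) pairs, so choosing between them is purely a question of readability:

```python
emails = {
    'Martin': 'martin@pythonforbiologists.com',
    'Jess': 'jessica.rodriguez@gmail.com',
    'Phil': 'p.smith@outlook.com',
}

# Style 1: unpack in the for statement itself
style_1 = []
for name, email_address in emails.items():
    style_1.append((name, email_address))

# Style 2: unpack inside the loop body
style_2 = []
for item in emails.items():
    name, email_address = item
    style_2.append((name, email_address))

# Style 3: index into each tuple
style_3 = []
for item in emails.items():
    style_3.append((item[0], item[1]))

print(style_1 == style_2 == style_3)  # True
```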

Bonus: we only need a dictionary if we want to be able to look up individual keys. If we find ourselves building a dictionary but only ever using it in a loop, it should probably be a list of tuples instead:

emails = [
    ('Martin','martin@pythonforbiologists.com'), # only this one is real!
    ('Jess','jessica.rodriguez@gmail.com'),
    ('Phil','p.smith@outlook.com')
]

which will be even easier to iterate over:

for name, email_address in emails:
    print(name, email_address, sep='\t')
Martin  martin@pythonforbiologists.com
Jess    jessica.rodriguez@gmail.com
Phil    p.smith@outlook.com

and will make it clear to anyone reading the code that the data is only ever looped over, never looked up by key.
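If it later turns out that we do need lookups after all, switching between the two representations is a one-liner in each direction. A minimal sketch, assuming Python 3.7 or later so that the dictionary keeps the original order:

```python
email_list = [
    ('Martin', 'martin@pythonforbiologists.com'),
    ('Jess', 'jessica.rodriguez@gmail.com'),
    ('Phil', 'p.smith@outlook.com'),
]

# list of pairs -> dictionary, for when we need to look up by name
email_dict = dict(email_list)
print(email_dict['Phil'])  # p.smith@outlook.com

# dictionary -> list of pairs, in insertion order
back_to_list = list(email_dict.items())
print(back_to_list == email_list)  # True
```

One caveat: if the list contains the same name twice, dict() silently keeps only the last value for that name, so the round trip is only exact when the names are unique.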

One more time, if you want to see the rest of these little write-ups, sign up for the mailing list:

Sign up for the mailing list